Abstract

Testing rare variants directly is possible with next-generation sequencing technology. In this article, we propose a 
sliding-window-based optimal-weighted approach to test for the effects of both rare and common variants across 
the whole genome. We measured the genetic association between a disease and a combination of variants of a 
single-nucleotide polymorphism window using the newly developed tests TOW and VW-TOW and performed a 
sliding-window technique to detect disease-susceptible windows. By applying the new approach to unrelated 
individuals of Genetic Analysis Workshop 18 on replicate 1 chromosome 3, we detected 3 highly susceptible 
windows across chromosome 3 for diastolic blood pressure and identified 10 of 48,176 windows as the most 
promising for both diastolic and systolic blood pressure. Seven of 9 top variants influencing diastolic blood 
pressure and 8 of 9 top variants influencing systolic blood pressure were found in or close to our top 10 windows. 



Background 

Hypertension is a common chronic destructive disease 
with unknown complex etiology [1]. More than 1 billion
people worldwide have hypertension, defined as blood 
pressure (BP) >140 mm Hg systolic (SBP) or >90 mm Hg 
diastolic (DBP) [2], which is a major risk factor for stroke, 
myocardial infarction, heart failure, and a cause of 
chronic kidney disease [3-5]. Both genetic and environ- 
mental bases are likely to contribute to this disease. Ehret 
et al. conducted a large-scale genome-wide association 
study of hypertension in 2011 and identified 10 novel loci 
related to BP physiology [6]. Although numerous com- 
mon genetic variants with small effects on BP have been 
identified [6-8], the identified variants account for only a 
small fraction of disease heritability [9]. One potential 
source of missing heritability is the contribution of rare 
variants. Recently, next-generation sequencing technology
has enabled the sequencing of the whole genome of
large groups of individuals, which makes directly testing
rare variants feasible. The Genetic Analysis Workshop 18
(GAW18) data set, which consists of whole genome
sequencing data, is a large-scale pedigree-based sample
of 959 individuals, 464 directly sequenced and the
rest imputed.

Several statistical methods have been proposed to 
detect associations of rare variants, including the com- 
bined multivariate and collapsing (CMC) method [10] 
and the weighted sum statistic (WSS) [11]. We previously
proposed a test for the effect of an optimally
weighted combination of variants (TOW) [12]. In addition,
based on TOW, we proposed a variable weight-TOW
(VW-TOW) that tests for the effects of both rare and
common variants. Both TOW and VW-TOW are applicable
to quantitative and qualitative traits, allow covariates,
and are robust to the directions of effects of causal
variants.

In this article, we report a novel whole genome sliding 
window approach to detect genetic association between a 
trait and single-nucleotide polymorphism (SNP) regions 
across the entire genome. This approach integrates TOW 
and VW-TOW with the concept of a sliding window [13].
Applied to the GAW18 replication 1, chromosome 3 data 
set, our approach yielded results consistent with the top 
genes influencing simulated SBP and DBP, which were 
generated from the GAW18 simulation model. 
Methods

Consider a sample of $n$ individuals. Each individual has
been genotyped at $M$ variants in a genomic region.
Denote $y_i$ as the quantitative trait value of the $i$th
individual. Denote $X_i = (x_{i1}, \ldots, x_{iM})^T$ as the
genotypic score of the $i$th individual, where
$x_{im} \in \{0, 1, 2\}$ is the number of minor alleles that
the $i$th individual has at the $m$th variant.

Suppose we have $p$ covariates. Let $(z_{i1}, \ldots, z_{ip})^T$
denote the covariates of the $i$th individual. We adjust both
the trait value $y_i$ and the genotypic score $x_{im}$ for the
covariates by applying linear regressions. That is,

$$y_i = \alpha_0 + \alpha_1 z_{i1} + \cdots + \alpha_p z_{ip} + \varepsilon_i$$

and

$$x_{im} = \alpha_{0m} + \alpha_{1m} z_{i1} + \cdots + \alpha_{pm} z_{ip} + \tau_{im}.$$

Let $\tilde{y}_i$ and $\tilde{x}_{im}$ denote the residuals of $y_i$
and $x_{im}$, respectively. Denote
$\tilde{X}_i = (\tilde{x}_{i1}, \ldots, \tilde{x}_{iM})^T$ as the
residuals of the genotypic score of the $i$th individual.
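The covariate adjustment described above can be sketched as follows. This is a minimal illustration, assuming ordinary least squares with an intercept; the function name `residualize` and the toy age, sex, genotype, and trait arrays are hypothetical and are not taken from the GAW18 data.

```python
import numpy as np

def residualize(values, covariates):
    """Residuals of `values` after least-squares regression on the covariates.

    values:     (n,) trait vector or (n, M) genotype matrix
    covariates: (n, p) covariate matrix; an intercept column is added here
    """
    values = np.asarray(values, dtype=float)
    Z = np.column_stack([np.ones(len(values)),
                         np.asarray(covariates, dtype=float)])
    # Least-squares fit; for a matrix input each column is fitted separately.
    coef, *_ = np.linalg.lstsq(Z, values, rcond=None)
    return values - Z @ coef

# Toy data (hypothetical): 142 individuals, 20 variants, covariates age and sex.
rng = np.random.default_rng(0)
n, M = 142, 20
Z = np.column_stack([rng.integers(20, 80, n), rng.integers(0, 2, n)])
X = rng.binomial(2, 0.05, size=(n, M))        # minor-allele counts in {0, 1, 2}
y = 70 + 0.2 * Z[:, 0] + rng.normal(size=n)   # trait that depends on age

y_res = residualize(y, Z)   # residual trait values
X_res = residualize(X, Z)   # residual genotypic scores
```

By construction, the residuals are orthogonal to the intercept and to every covariate, so any remaining association between `y_res` and `X_res` is not explained by the covariates.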

Using the generalized linear model (GLM) to model the 
relationship between trait values and genotypes is equiva- 
lent to modeling the relationship between the residuals of 
trait values and the residuals of genotypes through 
GLM (1), where g() is a monotone "link" function. 



$$g\left(E(\tilde{y}_i \mid \tilde{X}_i)\right) = \beta_0 + \beta_1 \tilde{x}_{i1} + \cdots + \beta_M \tilde{x}_{iM} \qquad (1)$$



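Under the GLM, the score test statistic to test the null
hypothesis $H_0: \beta = 0$ is given by $S = U^T V^{-1} U$, where

$$U = \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{X}_i - \bar{\tilde{X}})$$

and

$$V = \frac{1}{n} \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2 \sum_{i=1}^{n} (\tilde{X}_i - \bar{\tilde{X}})(\tilde{X}_i - \bar{\tilde{X}})^T.$$

The statistic $S$ asymptotically follows a chi-square
distribution with $k = \mathrm{rank}(V)$ degrees of freedom (DF).
For rare variants, however, the score test may lose power
as a result of the sparse data and a large DF $k$. In rare
variant association studies, to test for the effect of the
weighted combination of variants,
$\tilde{x}_i = \sum_{m=1}^{M} w_m \tilde{x}_{im}$, the score test
statistic becomes

$$S(w_1, \ldots, w_M) = n \frac{\left(\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_i - \bar{\tilde{x}})\right)^2}{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2 \sum_{i=1}^{n} (\tilde{x}_i - \bar{\tilde{x}})^2} = n \frac{\left(\sum_{m=1}^{M} w_m \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_{im} - \bar{\tilde{x}}_m)\right)^2}{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2 \sum_{i=1}^{n} (\tilde{x}_i - \bar{\tilde{x}})^2}.$$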



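Because rare variants are essentially independent, we
have

$$\sum_{i=1}^{n} (\tilde{x}_i - \bar{\tilde{x}})^2 = \sum_{m=1}^{M} \sum_{l=1}^{M} w_m w_l \sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)(\tilde{x}_{il} - \bar{\tilde{x}}_l) \approx \sum_{m=1}^{M} w_m^2 \sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)^2.$$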



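Let

$$a_m = \frac{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_{im} - \bar{\tilde{x}}_m)}{\sqrt{\sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)^2}} \quad \text{and} \quad u_m = w_m \sqrt{\sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)^2}.$$

Then, the score test statistic is approximately equal to

$$S_0(w_1, \ldots, w_M) = \frac{n}{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2} \cdot \frac{\left(\sum_{m=1}^{M} u_m a_m\right)^2}{\sum_{m=1}^{M} u_m^2}.$$

As a function of $(u_1, \ldots, u_M)$, $S_0(w_1, \ldots, w_M)$ reaches
its maximum when $u_m = a_m$, that is, when

$$w_m = \frac{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_{im} - \bar{\tilde{x}}_m)}{\sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)^2} \quad (m = 1, \ldots, M).$$

We denote this value by $w_m^o$ and call it the optimal weight:

$$w_m^o = \frac{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_{im} - \bar{\tilde{x}}_m)}{\sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)^2}.$$

Let $\tilde{x}_i^o = \sum_{m=1}^{M} w_m^o \tilde{x}_{im}$. Then

$$S_0(w_1^o, \ldots, w_M^o) = n \frac{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_i^o - \bar{\tilde{x}}^o)}{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2}.$$

We propose the new test statistic TOW to test the effect
of the optimally weighted combination of variants,
$\sum_{m=1}^{M} w_m^o \tilde{x}_{im}$, as

$$T_T = \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_i^o - \bar{\tilde{x}}^o).$$

$T_T$ is equivalent to $S_0(w_1^o, \ldots, w_M^o)$ because
$\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2$ is a constant.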

The optimal weight $w_m^o$ assigns large weights to the
variants that have strong associations with the trait of
interest and adjusts for the direction of the association.
Also, $w_m^o$ assigns large weights to rare variants, because
the denominator of $w_m^o$ is small when the minor allele is
rare. TOW targets rare variants and will lose power when
testing for the effects of both rare and common variants.
For testing the effects of both rare and common variants,
we propose a new statistic, VW-TOW. We divide variants
into rare (minor allele frequency [MAF] < the rare variant
threshold [RVT]) and common (MAF > RVT), and apply TOW
to the rare and common variants separately.
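The optimal weights and the TOW statistic defined above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `tow_statistic` is hypothetical, the inputs are taken to be the covariate-adjusted residuals, and monomorphic variants (zero denominator) are given weight 0, a choice the text does not specify.

```python
import numpy as np

def tow_statistic(y_res, X_res):
    """Return (T_T, optimal weights) from residual trait and residual genotypes."""
    yc = y_res - y_res.mean()
    Xc = X_res - X_res.mean(axis=0)
    num = Xc.T @ yc                  # sum_i (y_i - ybar)(x_im - xbar_m), per variant
    den = (Xc ** 2).sum(axis=0)      # sum_i (x_im - xbar_m)^2, per variant
    # Assumption: variants without variation get weight 0.
    w = np.divide(num, den, out=np.zeros_like(num), where=den > 1e-12)
    x_opt = Xc @ w                   # centered optimally weighted score per individual
    return float(yc @ x_opt), w

# Tiny worked example with one variant.
y = np.array([1.0, 2.0, 3.0, 4.0])
X = np.array([[0.0], [0.0], [1.0], [1.0]])
T, w = tow_statistic(y, X)                   # T = 4.0, w = [2.0]
T_flip, w_flip = tow_statistic(y, 1.0 - X)   # T = 4.0, w = [-2.0]
```

Recoding the allele flips the sign of the weight but leaves the statistic unchanged, which illustrates why the test is robust to the direction of effects.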

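Define the test statistic of VW-TOW as
$T_{VW-T} = \min_{0 \le \lambda \le 1} p_\lambda$, where $p_\lambda$ is
the p value of

$$T_\lambda = \lambda \frac{T_r}{\sqrt{\mathrm{var}(T_r)}} + (1 - \lambda) \frac{T_c}{\sqrt{\mathrm{var}(T_c)}},$$

and $T_r$ and $T_c$ denote the test statistics of TOW for rare
and common variants, respectively. Here, we evaluate the
minimization by dividing the interval $[0, 1]$ into $K$
subintervals of equal length. Let $\lambda_k = k/K$ for
$k = 0, 1, \ldots, K$. Then,
$\min_{0 \le \lambda \le 1} p_\lambda = \min_{0 \le k \le K} p_{\lambda_k}$.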

We use permutation tests to evaluate the p values of both
$T_T$ and $T_{VW-T}$. To evaluate the p value of the test $T_T$,
let $T^0$ denote the value of the test statistic based on the
original data set. For each permutation, we randomly
permute the residuals of the trait values and denote the
value of the test statistic based on the permuted data
set by $T^{per}$. We perform the permutation procedure
many times. Then the p value of the test is the proportion
of permutations with $T^{per} > T^0$. We perform $B$
permutations to evaluate the p value of $T_{VW-T}$. Let
$T_r^{(b)}$ and $T_c^{(b)}$ denote the values of $T_r$ and $T_c$
based on the $b$th permuted data set, where $b = 0$
represents the original data. Based on $T_r^{(b)}$ and
$T_c^{(b)}$ $(b = 0, 1, \ldots, B)$, we can calculate
$T_{\lambda_k}^{(b)}$ for $b = 0, 1, \ldots, B$ and
$k = 0, 1, \ldots, K$, where $\mathrm{var}(T_r)$ and
$\mathrm{var}(T_c)$ are estimated using $T_r^{(b)}$ and
$T_c^{(b)}$ $(b = 1, \ldots, B)$. Then, we transform
$T_{\lambda_k}^{(b)}$ to $p_{\lambda_k}^{(b)}$ by

$$p_{\lambda_k}^{(b)} = \frac{1}{B} \sum_{d=0}^{B} I\left(T_{\lambda_k}^{(d)} > T_{\lambda_k}^{(b)}\right),$$

where $I(\cdot)$ is the indicator function. Let
$p^{(b)} = \min_{0 \le k \le K} p_{\lambda_k}^{(b)}$. Then the
p value of $T_{VW-T}$ is given by

$$\frac{1}{B} \sum_{b=1}^{B} I\left(p^{(b)} < p^{(0)}\right).$$
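The two permutation procedures can be sketched together for a single window. This is a minimal sketch, not the authors' implementation: the function names, the boolean `rare` mask, the 0.05 threshold used to build it, the toy genotypes, and the small `B` are all hypothetical choices for illustration. The sketch assumes the standardized combination $T_\lambda = \lambda T_r/\sqrt{\mathrm{var}(T_r)} + (1-\lambda) T_c/\sqrt{\mathrm{var}(T_c)}$ with variances estimated from the permuted statistics.

```python
import numpy as np

def tow(y, X):
    """TOW statistic with the optimal weights plugged in (sum of num^2 / den)."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    num = Xc.T @ yc
    den = (Xc ** 2).sum(axis=0)
    keep = den > 1e-12                      # skip variants without variation
    return float(np.sum(num[keep] ** 2 / den[keep]))

def permutation_pvalues(y, X, rare, B=1000, K=10, seed=0):
    """Return (p value of TOW, p value of VW-TOW) for one window.

    rare: boolean mask over the columns of X (True = treated as rare).
    """
    rng = np.random.default_rng(seed)
    T_all, T_r, T_c = (np.empty(B + 1) for _ in range(3))
    for b in range(B + 1):                  # b = 0 is the original data
        yb = y if b == 0 else rng.permutation(y)
        T_all[b] = tow(yb, X)
        T_r[b] = tow(yb, X[:, rare])
        T_c[b] = tow(yb, X[:, ~rare])

    # TOW: proportion of permutations with T_per > T_0.
    p_tow = float(np.mean(T_all[1:] > T_all[0]))

    # VW-TOW: standardize with variances estimated from b = 1..B.
    sd_r = T_r[1:].std(ddof=1) or 1.0       # guard for an empty or constant group
    sd_c = T_c[1:].std(ddof=1) or 1.0
    lam = np.arange(K + 1) / K
    T_lam = np.outer(T_r / sd_r, lam) + np.outer(T_c / sd_c, 1.0 - lam)

    # p_lam[b, k] = #{d in 0..B : T_lam[d, k] > T_lam[b, k]} / B
    p_lam = np.empty_like(T_lam)
    for k in range(K + 1):
        col = T_lam[:, k]
        rank_right = np.searchsorted(np.sort(col), col, side="right")
        p_lam[:, k] = (B + 1 - rank_right) / B
    p_min = p_lam.min(axis=1)
    p_vw = float(np.mean(p_min[1:] < p_min[0]))
    return p_tow, p_vw

# Toy window (hypothetical): 2 common and 3 rare variants, first variant causal.
rng = np.random.default_rng(1)
n = 200
maf = np.array([0.3, 0.25, 0.02, 0.02, 0.01])
X = rng.binomial(2, maf, size=(n, maf.size)).astype(float)
y = 1.5 * X[:, 0] + rng.normal(size=n)
p_tow, p_vw = permutation_pvalues(y, X, rare=maf < 0.05, B=200, K=10)
```

Reusing the same permuted trait vector for the rare and the common group keeps the two statistics paired within each permutation, which the standardization and the minimum over $\lambda_k$ rely on.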

We use TOW and VW-TOW to analyze the data set
of unrelated individuals of GAW18 replication 1 on
chromosome 3. To apply TOW and VW-TOW to the
entire chromosome 3, we propose a sliding-window
approach [13]. To use sliding windows, we divide all SNPs
into contiguous windows and apply TOW and VW-TOW
in each window. With a window size of $S$, all the SNPs
are divided into windows: SNPs 1 to $S$, $S+1$ to $2S$,
$2S+1$ to $3S$, and so on.
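The window partition can be sketched as a small generator. The function name `window_bounds` is hypothetical, and letting the last window hold the leftover SNPs is an assumption, since the text does not say how a remainder is handled. With the 963,507 SNPs retained on chromosome 3 and a window size of 20, this partition gives the 48,176 windows reported in the Results.

```python
def window_bounds(n_snps, size):
    """Yield 1-based inclusive (start, end) SNP indices of contiguous windows."""
    for start in range(1, n_snps + 1, size):
        # Assumption: the final window is shorter if n_snps is not a multiple of size.
        yield start, min(start + size - 1, n_snps)

windows = list(window_bounds(963_507, 20))
print(len(windows))                          # 48176
print(windows[0], windows[1], windows[-1])   # (1, 20) (21, 40) (963501, 963507)
```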

To analyze the data set of GAW18 replication 1,
chromosome 3 for unrelated individuals, we set the window
size to 20. First, we performed quality control of the
genotype data with the PLINK toolset. We used
10,000,000 permutations to evaluate the empirical
p values of TOW for the DBP and SBP data, and 100,000
permutations to evaluate the empirical p values of
VW-TOW for the DBP and SBP data. Because the sample of
unrelated individuals in GAW18 is relatively small, it is
not reasonable to claim significance by either the
false-discovery rate or the Bonferroni-corrected threshold.
Therefore, we recommend the top 10 most promising
windows with the smallest p values for follow-up studies.
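For scale, the arithmetic behind a chromosome-wide Bonferroni threshold can be worked out directly. This is only an illustration: the family-wise level of 0.05 is an assumption (the text does not state one), and the comparison with the permutation counts is a back-of-the-envelope check rather than part of the reported analysis.

```python
n_windows = 48_176          # windows on chromosome 3 with window size 20
alpha = 0.05                # hypothetical family-wise level, not stated in the text

bonferroni = alpha / n_windows

# The smallest nonzero empirical p value from B permutations is 1 / B.
resolution_tow = 1 / 10_000_000     # TOW: 10,000,000 permutations
resolution_vw_tow = 1 / 100_000     # VW-TOW: 100,000 permutations

print(f"{bonferroni:.3g}")                                          # 1.04e-06
print(resolution_tow < bonferroni, resolution_vw_tow < bonferroni)  # True False
```

Under these assumptions, the TOW permutation count resolves p values finer than the threshold, while the VW-TOW count does not, so ranking windows by p value is the more informative summary for VW-TOW.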

Results 

We applied TOW and VW-TOW with the sliding-window
approach to analyze the hypertension data set of
unrelated individuals in GAW18. To facilitate
comparisons among GAW18 contributions, we analyzed
only replicate 1 on chromosome 3. To evaluate the type I
error rates of TOW and VW-TOW, we used all 200
replicates of the simulated phenotype data. There are 157
unrelated individuals in the GAW18 pedigree-based
sample. Among the 157 individuals, 142 have observations
for SBP, DBP, and other demographic/clinical variables
at exam 1. Our analysis was based on these 142
individuals and their genotypes, the quantitative traits
SBP and DBP, and other characteristics at exam 1.

The total genotyping rate in the 142 individuals is
0.9997. We did not find any duplicated samples or
sample contamination. No individual was filtered out
by the multidimensional scaling (MDS) analysis. Of
the 1,215,399 SNPs on chromosome 3, we removed
251,892 completely missing SNPs and retained 963,507
SNPs for final analysis. Because SBP and DBP varied by
sex and increased with age, age and sex were included
as covariates in this study.

We listed the top 10 most promising windows out of 
48,176 windows across the entire chromosome 3. The 
top 8 windows all reside in gene MAP4, which is the 
most susceptible gene on chromosome 3 for hyperten- 
sion. Seven of 9 top variants influencing DBP and 8 of 9 
top variants influencing SBP on chromosome 3 were 
found in or close to our top windows. Tables 1 and 2 
show the top 10 most promising windows by TOW that 
are associated with DBP and SBP, respectively. The p
values of TOW in the top 3 windows of Table 1 are very
small (2.34 × 10^-7 to 4.09 × 10^-6). SNPs 3_47957996,
3_47956424, and 3_47957741 are the third, fourth, and
ninth variants in Table 2 of the GAW18 answer sheet.
They all fell into our third window in Table 1 and the
first window in Table 2.

To evaluate the type I error rates of the proposed slid- 
ing window approach, we chose 100 blocks (20 variants 
in each block) from chromosome 3 that are far from 
causal variants. In each block, we applied TOW and 
VW-TOW to each of the 200 replicates to test associa- 
tion between genotypes and the trait DBP. We obtained 
1 p value for each replicate and each block. Figure 1
shows the histograms of the p values of TOW and VW-TOW.
The histograms indicate that the type I error rates of both
TOW and VW-TOW are under control.
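A numerical companion to the histograms is the empirical type I error rate, the fraction of null p values at or below a nominal level. The sketch below is illustrative only: the function name is hypothetical, and uniform random numbers stand in for the 100 blocks × 200 replicates of null p values, which are not reproduced here.

```python
import random

def empirical_type1_error(p_values, alpha):
    """Fraction of null p values at or below the nominal level alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)

random.seed(1)
# Stand-in for 100 blocks x 200 replicates of null p values (uniform under H0).
null_p = [random.random() for _ in range(100 * 200)]

for alpha in (0.05, 0.01):
    print(alpha, empirical_type1_error(null_p, alpha))
```

For a well-calibrated test, the printed rates should be close to the nominal levels, which corresponds to a roughly flat histogram of p values.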

Discussion 

In this article, we proposed a sliding-window-based opti- 
mal weighted approach to test for the effects of both 
rare and common variants across the whole genome. In 



Table 1 Top 10 most promising windows associated with DBP

WID  Chr  Physical location    Empirical p_TOW  Empirical p_VW-TOW  Gene    Reference variants
1    3    48117215,48121372    2.34 × 10^-7     0.0005              MAP4
2    3    48063171,48068858    4.95 × 10^-7     0.0005              MAP4
3    3    47957289,47961091    4.09 × 10^-6     0.0006              MAP4    3_47957996, 3_47956424, 3_47957741
4    3    48034051,48040240    1.42 × 10^-5     0.001               MAP4    3_48040284, 3_48040283
5    3    48089115,48094079    2.06 × 10^-5     0.001               MAP4
6    3    48005035,48009105    2.69 × 10^-5     0.0015              MAP4
7    3    47929938,47935009    5.29 × 10^-5     0.001               MAP4
8    3    47912703,47920240    9.06 × 10^-5     0.001               MAP4    3_47913455
9    3    4474736,4477687      0.036            0.071               SUMF1   3_45008742
10   3    56871312,56875674    0.03             0.058               ARHGF3  3_56870810*

Chr, Chromosome; empirical p_TOW, p value of TOW; empirical p_VW-TOW, p value of VW-TOW; reference variants, top 55 variants influencing SBP and DBP in Table 2
of the answers of GAW18; WID, window ID.

*The variants are provided in the Supplemental Table 1 of the answer sheet of GAW18.
Table 2 Top 10 most promising windows associated with SBP

WID  Chr  Physical location    Empirical p_TOW  Empirical p_VW-TOW  Gene    Reference variants
1    3    47957289,47961091    0.005            0.004               MAP4    3_47957996, 3_47956424, 3_47957741
2    3    48034051,48040240    0.003            0.007               MAP4
3    3    [illegible]          0.01             [illegible]         MAP4
4    3    48040283,48046708    0.02             0.01                MAP4    3_48040284, 3_48040283
5    3    47912703,47920240    0.03             0.017               MAP4    3_47913455
6    3    48121395,48126740    0.015            0.032               MAP4
7    3    47929938,47935009    0.025            0.04                MAP4
8    3    48063171,48068858    0.015            0.01                MAP4
9    3    58104877,58108614    0.01             0.031               FLNB    3_58109162
10   3    15664089,15667215    0.039            0.011               BTD     3_15686693

Chr, Chromosome; empirical p_TOW, p value of TOW; empirical p_VW-TOW, p value of VW-TOW; reference variants, top 55 variants influencing SBP and DBP in Table 2
of the answers of GAW18; WID, window ID.
Figure 1 Histograms of p values for TOW and VW-TOW.

Figure 2 Power comparisons of TOW, CMC, VW-TOW, and WSS using DBP as phenotype measurement. The numbers on the x axis refer
to the 44 blocks of gene MAP4.



each window, our recently developed TOW and VW-TOW
were applied to test genetic association between a
disease and a combination of variants. Then, we applied
the method to unrelated individuals of GAW18 on
replicate 1, chromosome 3. We detected 3 susceptible
windows across chromosome 3 for DBP and identified
10 out of 48,176 windows as the most promising windows
for DBP and SBP. Because this is a simulated data set,
it is possible that other genes that were not listed in
the top 10 windows are actually related to SBP or DBP.

In this study, we used windows of size 20 across
the entire chromosome 3. How to choose an appropriate
window size is a critical question. We evaluated the
effect of window size by also running the analysis with
window sizes of 30, 40, and 50. The power of TOW did
not increase with a larger window size. Although the
power of VW-TOW increased slightly with a larger
window size, no window passed the Bonferroni-corrected
threshold for the entire chromosome 3.

TOW and VW-TOW can be made robust to population
stratification by adjusting for the first K principal
components (PCs) of genotypes at genomic markers as
covariates when calculating the residuals of the trait
and of the genotype matrix. In this GAW18 data analysis,
we did not adjust for PCs because we believed that
population stratification was not severe in these data
based on our MDS analysis.
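The PC adjustment mentioned above, which was not applied in this analysis, could be sketched as follows. This is a hypothetical illustration: the function name `top_pcs`, the use of a plain singular value decomposition of the centered genotype matrix, and the toy marker and covariate arrays are all assumptions, not the authors' procedure.

```python
import numpy as np

def top_pcs(G, k):
    """Scores of the first k principal components of an (n, L) genotype matrix."""
    Gc = G - G.mean(axis=0)                       # center each marker
    U, s, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :k] * s[:k]                       # (n, k) PC scores

# Toy data (hypothetical): 142 individuals, 500 genomic markers, K = 2 PCs.
rng = np.random.default_rng(0)
n, L, k = 142, 500, 2
G = rng.binomial(2, 0.2, size=(n, L)).astype(float)
age_sex = np.column_stack([rng.integers(20, 80, n), rng.integers(0, 2, n)])

# PCs are appended to the other covariates before computing residuals.
covariates = np.column_stack([age_sex, top_pcs(G, k)])
```

The resulting covariate matrix would then be used in the same linear regressions that produce the residuals of the trait and of the genotypes.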

To further assess our new approach, we compared the 
power of TOW, VW-TOW, CMC, and WSS to detect 
association between gene MAP4 and DBP. MAP4
was split into 44 windows (blocks) with 20 variants in
each window. In each window, we calculated the power
of each method based on 200 replicates. The power
comparisons based on the phenotype measurement DBP are
given in Figure 2. This figure shows that in most of the
windows, TOW is the most powerful test and VW-TOW is
the second most powerful test.